-
Notifications
You must be signed in to change notification settings - Fork 653
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GROMOS11 Reader #4292
GROMOS11 Reader #4292
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello there first time contributor! Welcome to the MDAnalysis community! We ask that all contributors abide by our Code of Conduct and that first time contributors introduce themselves on the developer mailing list so we can get to know you. You can learn more about participating here. Please also add yourself to package/AUTHORS
as part of this PR.
Linter Bot Results:Hi @JoStoe! Thanks for making this PR. We linted your code and found the following: Some issues were found with the formatting of your code.
Please have a look at the Please note: The |
Please have a look at my comment in issue #3743 (comment) The tests for this reader use assert_allclose() instead of assert_almost_equal() |
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## develop #4292 +/- ##
===========================================
+ Coverage 93.37% 93.41% +0.04%
===========================================
Files 170 185 +15
Lines 22340 23626 +1286
Branches 4085 4129 +44
===========================================
+ Hits 20859 22070 +1211
- Misses 963 1036 +73
- Partials 518 520 +2 ☔ View full report in Codecov by Sentry. |
Thanks for this @JoStoe - before we review, is there a format specification we can look at somewhere? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the contribution! I have some initial comments, the most important one being that we need to have a specification of the file format.
Please note that our User Group Meeting is coming up and we are all fairly busy until that's over, so please be patient with reviewers! Thanks.
Reads coordinates, timesteps and box-sizes from GROMOS11 TRC trajectories. | ||
|
||
To load the trajectory into :class:`~MDAnalysis.core.universe.Universe`, | ||
you need to provide topology information using a pdb:: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is a pdb absolutely necessary or could it be any file that MDA can create a topology from?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made no artificial limitation on which source can be used as topology. It is only important, that the order of atoms in the topology is strictly the same as in the trajectory file. As MDAnalysis does not support gromos *.cnf files as well as gromos native topologies, I think all users will use PDB files as topology.
I tested the functionality of the reader only with pdb files as topology, as other topology formats are difficult to obtain starting from a gromos simulation.
Thank you for the fast initial review and your valued feedback. Regarding the documentation, I will get into contact with more senior GROMOS developers to find proper specifications of the trajectory format. For now, I edited the initial PR to show a link to the technical documentation of GROMOS. If I don't find a more handy specification, I will write that on my own. Please note, that in this case I would need a positive sign-off from the GROMOS Maintainers, which may take a month. Regarding the feedback on the code, I will work through the comments and prepare an update to the code. |
Thank you for putting in the effort to give feedback to this PR; I think we are making progress. With the last few commits I hope to have addressed most of the change requests. Most should be marked as resolved; I kept the ones open where I wrote longer answers. I further hope that I did not miss any requests; if so please inform me so I can solve them. Some points that might still be open:
Thank you :) |
Thanks! Re commit history: You can rewrite your history on this commit yourself if you think it fits in a handful (1-3 typically) well-defined commits. Otherwise we will squash it all into a single commit when we merge. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you put some code here that you used to observe/benchmark this behavior? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've left some comments, but in general if this is working for you then this would be a great addition as-is. Things can always be iterated on.
if (ext[1:] == "trc") or (ext[1:] == "trj"): | ||
self.compression = None | ||
else: | ||
self.compression = ext[1:] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't see where else this attribute is used
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not used but should have been the documented optional attribute compressed
of the trajectory reader class that might be accessed by the user.
I corrected it from compression
to compressed
The readers XYZ and H5MD also use the attribute compression
. Might be worth to check out, if that is also a typo.
frameDat["time"] = float(tmp_time) | ||
|
||
elif "POSITIONRED" == line.strip(): | ||
tmp_buf = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rather than creating a list, appending to it, then converting this to an array, I think we've found it's faster to create a np.array of the correct size, pass the strings in to this (numpy converts strings faster than Python). You'll just have to guard against setting more than n_atoms rows of the numpy array
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested that out today and have seen different results (at least with the gromos11 file format).
When I pre-allocate a numpy array of size (n_atoms,3) then I have to cast strings to np.float64 every time an atom is added and I need to guard the range of the array to avoid a possible overflow-error. Both are very costly and in the end, it does not yield any performance improvement.
As it is, I fill up the list with strings and convert and check for length only once after reading. I ask to stick with the existing solution.
Overall, the reader is slow. This is likely to to the nature of the compressed, very flexible and human-readable gromos trajectories, which are costly to parse.
@hmacdope could you please have a look at this PR in the next few days and see if any outstanding comments have been addressed? |
@JoStoe please have a careful look at @richardjgowers' comments, which all look like things that are easy to address to improve code robustness. Please also resolve conflicts (as this will put the PR into a stage where we can merge quickly once everybody is happy). Do feel free to ping me if you're waiting on anything specific to happen. |
@orbeckst I tried to address/answer to the recommendations in the comments above. Regarding the merge conflicts, may I ask you to look through the pull requests if there are any that are likely to be merged before this PR? It seems to me there are some that might get merged very soon (#4366, #4368, #4370) which do changes to the |
@JoStoe - I can very much sympathise that merge conflicts are a lot of effort to keep on top of. However, in order for your code to get reviewed you need to ensure that CI can pass. If there's a direct merge conflict for any files you touched then CI will be blocked by it. As far as I'm aware, none of the PRs you point to are touching files you have touched except the changelog (which is a relatively low cost merge conflict, at worst we can deal with that one ourselves). |
The amazing @JoStoe has addressed all my comments, the only open question being conversion of trunc-oct boxes which we will probably address down the line. With tests passing this should be good to merge from my POV. Thanks for all your work @JoStoe! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wahooo 🥳
Thank you for the good feedback :) I still added my github handle to the CHANGELOG, I think this is usually done - correct me if I'm wrong. |
It's merged, congratulations to your first merged PR, @JoStoe 🎉 , that was a sizable one! And thank you very much for your professional (and patient) interactions with everyone on the PR, with very diligent attention to detail. We hope to see more of you :-). I'll also send you a "contributor" invite for the MDA org. |
Fixes #4291
Changes made in this Pull Request:
It can read timestep information, box-sizes and atom-coordinates.
You can find more information about the GROMOS software here: https://www.gromos.net/
Technical details about GROMOS data structures and formats can be found here:
https://gromos.net/gromos11_pdf_manuals/vol4.pdf
The relevant documentation of the GROMOS11 trajectory format, used in this PR, can be found in chapter 2 and 4 (page number 3 and 37-56).
The trajectory format organizes its data in blocks, which allows it to carry data depending on the task of the simulation. This PR implements the blocks TIMESTEP, GENBOX and POSITIONRED, which should cover the vast majority of trajectories generated with the GROMOS11 software. In a non-supported block is encountered, a warning is served to the user and the data in the block is ignored.
\Edit: Updated on 2023-11-16 to give more information about the trajectory format
PR Checklist
Developers certificate of origin
📚 Documentation preview 📚: https://mdanalysis--4292.org.readthedocs.build/en/4292/